Distance-Based Independence Screening for Canonical Analysis
This paper introduces a new method named Distance-based Independence
Screening for Canonical Analysis (DISCA) to reduce dimensions of two random
vectors with arbitrary dimensions. The objective of our method is to identify the lowest-dimensional linear projections of two random vectors such that any further reduction by linear projection would necessarily destroy part of the dependence structure; that is, the removed components would not be independent. The essence of DISCA is to use the distance correlation to eliminate "redundant" dimensions until no further elimination is possible. Unlike the existing
canonical analysis methods, DISCA does not require the dimensions of the
reduced subspaces of the two random vectors to be equal, nor does it require
any particular distributional assumption on the random vectors. We show that under mild conditions, our approach uncovers the lowest possible linear dependency structures between two random vectors, and our conditions are weaker than those of some sufficient linear subspace-based methods. Numerically, DISCA requires solving a non-convex optimization problem. We formulate it as a
difference-of-convex (DC) optimization problem, and then further adopt the
alternating direction method of multipliers (ADMM) on the convex step of the DC
algorithm to parallelize and accelerate the computation. Some sufficient linear subspace-based methods use a potentially computation-intensive bootstrap method to determine the dimensions of the reduced subspaces in advance; our method avoids this complexity. In simulations, we present cases that DISCA can solve
effectively, while other methods cannot. In both the simulation studies and
real data cases, when the other state-of-the-art dimension reduction methods
are applicable, we observe that DISCA performs comparably to or better than most of them. Code and an R package can be found on GitHub: https://github.com/ChuanpingYu/DISCA.
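The screening step is easy to illustrate. Below is a minimal Python sketch, not the authors' R package: it computes the empirical distance correlation from its standard definition, and the coordinate-wise greedy loop and cutoff `tol` are simplifying assumptions, whereas DISCA itself searches over arbitrary linear projections via the DC/ADMM scheme described above.

    import numpy as np

    def distance_correlation(x, y):
        """Empirical distance correlation; rows of x and y are paired observations."""
        x = np.asarray(x, float).reshape(len(x), -1)
        y = np.asarray(y, float).reshape(len(y), -1)

        def doubly_centered(z):
            d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
            return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

        a, b = doubly_centered(x), doubly_centered(y)
        dcov2 = (a * b).mean()
        denom = np.sqrt((a * a).mean() * (b * b).mean())
        return np.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0

    def greedy_screen(X, Y, tol=0.15):
        """Coordinate-wise caricature of the elimination idea: drop a column of X
        whenever it appears distance-uncorrelated with Y.  The cutoff tol is a
        heuristic; a permutation test would be more principled."""
        keep = list(range(X.shape[1]))
        for j in range(X.shape[1]):
            if len(keep) > 1 and distance_correlation(X[:, [j]], Y) < tol:
                keep.remove(j)
        return keep

    # toy usage: only the first two coordinates of X drive Y
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 4))
    Y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=300)
    print(greedy_screen(X, Y))  # coordinates 2 and 3 tend to be dropped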
Adaptive multiscale detection of filamentary structures in a background of uniform random points
We are given a set of $n$ points that might be uniformly distributed in the unit square $[0,1]^2$. We wish to test whether the set, although mostly consisting of uniformly scattered points, also contains a small fraction of points sampled from some (a priori unknown) curve with $C^\alpha$-norm bounded by $\beta$. An asymptotic detection threshold exists in this problem; for a constant $T_-(\alpha,\beta) > 0$, if the number of points sampled from the curve is smaller than $T_-(\alpha,\beta)\, n^{1/(1+\alpha)}$, reliable detection is not possible for large $n$. We describe a multiscale significant-runs algorithm that can reliably detect concentration of data near a smooth curve, without knowing the smoothness information $\alpha$ or $\beta$ in advance, provided that the number of points on the curve exceeds $T_+(\alpha,\beta)\, n^{1/(1+\alpha)} \sqrt{\log n}$. This algorithm therefore has an optimal detection threshold, up to a factor $\sqrt{\log n}$. At the heart of our approach is an analysis of the data by counting membership in multiscale multianisotropic strips. The strips will have area $2/n$ and exhibit a variety of lengths,
orientations and anisotropies. The strips are partitioned into anisotropy
classes; each class is organized as a directed graph whose vertices all are
strips of the same anisotropy and whose edges link such strips to their "good continuations." The point-cloud data are reduced to counts that measure
membership in strips. Each anisotropy graph is reduced to a subgraph that consists of strips with significant counts. The algorithm rejects $H_0$ whenever some such subgraph contains a path that connects many consecutive significant counts.

Comment: Published at http://dx.doi.org/10.1214/009053605000000787 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
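As a rough illustration of the count-and-threshold step only, the hypothetical Python sketch below uses axis-aligned strips of area $2/n$ at several aspect ratios with a binomial significance cutoff; the actual algorithm additionally rotates the strips and chains significant ones through the good-continuation graphs, which is what controls false alarms.

    import numpy as np
    from scipy.stats import binom

    def significant_strips(points, n_scales=4, alpha=1e-4):
        """Flag axis-aligned strips of area 2/n whose point count exceeds the
        binomial(n, area) (1 - alpha)-quantile.  Toy version: no rotations and
        no runs/chaining, unlike the real multiscale algorithm, so thresholding
        alone is all that limits chance flags here."""
        n = len(points)
        area = 2.0 / n
        flagged = []
        for k in range(n_scales):
            length = 2.0 ** (-k)      # x-extent of each strip
            width = area / length     # y-extent; area is fixed at 2/n
            nx, ny = int(1 / length), int(1 / width)
            thresh = binom.ppf(1 - alpha, n, area)
            counts = np.zeros((nx, ny), dtype=int)
            ix = np.minimum((points[:, 0] / length).astype(int), nx - 1)
            iy = np.minimum((points[:, 1] / width).astype(int), ny - 1)
            np.add.at(counts, (ix, iy), 1)
            for i, j in zip(*np.nonzero(counts > thresh)):
                flagged.append((k, i, j, int(counts[i, j])))
        return flagged

    # toy data: uniform background plus a few points on the curve y = x^2
    rng = np.random.default_rng(0)
    background = rng.uniform(size=(2000, 2))
    t = rng.uniform(size=200)
    curve = np.clip(np.column_stack([t, t ** 2]), 0.0, 1.0)
    pts = np.vstack([background, curve])
    print(len(significant_strips(pts)), "significant strips flagged")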
Adjusted Wasserstein Distributionally Robust Estimator in Statistical Learning
We propose an adjusted Wasserstein distributionally robust estimator, based on a nonlinear transformation of the Wasserstein distributionally robust (WDRO) estimator in statistical learning. This transformation improves the
statistical performance of WDRO because the adjusted WDRO estimator is
asymptotically unbiased and has an asymptotically smaller mean squared error.
The adjusted WDRO estimator does not compromise the out-of-sample performance guarantee of WDRO. Sufficient conditions for the existence of the adjusted WDRO estimator
are presented, and the procedure for the computation of the adjusted WDRO
estimator is given. Specifically, we show how the adjusted WDRO estimator is developed in the generalized linear model. Numerical experiments demonstrate the favorable practical performance of the adjusted estimator over the classic one.
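For intuition, here is a hypothetical Python sketch for linear regression. It leans on the commonly cited equivalence between Wasserstein DRO of the squared loss and norm-regularized least squares (taken at face value here, with the exact norm and radius calibration elided), and the rescaling in `adjusted_wdro` is an illustrative de-shrinkage stand-in, not the paper's nonlinear transformation.

    import numpy as np
    from scipy.optimize import minimize

    def wdro_linear(X, y, delta):
        """WDRO linear regression in its regularized form:
        minimize  sqrt(mean((y - X b)^2)) + delta * ||b||_2.
        (A commonly cited reformulation of the worst-case squared loss
        over a Wasserstein ball; norm/radius details are elided.)"""
        def objective(b):
            r = y - X @ b
            return np.sqrt(np.mean(r ** 2)) + delta * np.linalg.norm(b)
        return minimize(objective, np.zeros(X.shape[1]), method="Nelder-Mead").x

    def adjusted_wdro(X, y, delta):
        """Illustrative de-shrinkage, NOT the paper's transformation: keep the
        WDRO direction but restore the scale suggested by least squares."""
        b_wdro = wdro_linear(X, y, delta)
        b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
        scale = (b_ols @ b_wdro) / max(b_wdro @ b_wdro, 1e-12)
        return scale * b_wdro

    # toy usage: WDRO shrinks the coefficients; the adjustment undoes the bias
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.5 * rng.normal(size=200)
    print(wdro_linear(X, y, delta=0.1))
    print(adjusted_wdro(X, y, delta=0.1))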
Classification of Data Generated by Gaussian Mixture Models Using Deep ReLU Networks
This paper studies the binary classification of unbounded data from $\mathbb{R}^d$ generated under Gaussian Mixture Models (GMMs) using deep ReLU neural networks. We obtain, for the first time,
non-asymptotic upper bounds and convergence rates of the excess risk (excess
misclassification error) for the classification without restrictions on model
parameters. The convergence rates we derive do not depend on the dimension $d$,
demonstrating that deep ReLU networks can overcome the curse of dimensionality
in classification. While the majority of existing generalization analysis of
classification algorithms relies on a bounded domain, we consider an unbounded
domain by leveraging the analyticity and fast decay of Gaussian distributions.
To facilitate our analysis, we give a novel approximation error bound for
general analytic functions using ReLU networks, which may be of independent
interest. Gaussian distributions are well suited to modeling data arising in applications such as speech, images, and text; our results provide a theoretical verification of the observed efficiency of deep neural networks in practical classification problems.
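A toy version of the setting is easy to simulate. In the Python sketch below, the two-component class-0 mixture, the network widths, and the use of scikit-learn's MLPClassifier are all illustrative choices, not the paper's construction:

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    d, n = 10, 2000
    centers = 2.0 * rng.normal(size=(3, d))   # two components for class 0, one for class 1
    comp = rng.integers(0, 2, size=n)
    X0 = centers[comp] + rng.normal(size=(n, d))   # class 0: two-component GMM
    X1 = centers[2] + rng.normal(size=(n, d))      # class 1: single Gaussian
    X = np.vstack([X0, X1])
    y = np.r_[np.zeros(n), np.ones(n)]

    # a small deep ReLU network as the classifier (sizes are arbitrary choices)
    clf = MLPClassifier(hidden_layer_sizes=(64, 64), activation="relu",
                        max_iter=500, random_state=0).fit(X, y)
    print("training accuracy:", clf.score(X, y))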
Learning Ability of Interpolating Deep Convolutional Neural Networks
It is frequently observed that overparameterized neural networks generalize well. Regarding this phenomenon, existing theoretical work is mainly devoted to linear settings or fully connected neural networks. This paper studies the
learning ability of an important family of deep neural networks, deep
convolutional neural networks (DCNNs), under both underparameterized and
overparameterized settings. We establish the first learning rates for underparameterized DCNNs free of the parameter restrictions or function-variable structure assumptions found in the literature. We also show that by adding
well-defined layers to a non-interpolating DCNN, we can obtain some
interpolating DCNNs that maintain the good learning rates of the
non-interpolating DCNN. This result is achieved by a novel network deepening
scheme designed for DCNNs. Our work provides a theoretical verification of how overfitted DCNNs generalize well.
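The flavor of the deepening idea can be sketched in a few lines of Python. The identity-layer trick below, relu(x + B) - B, equals x on any bounded input range; it is a generic construction used here for illustration, not necessarily the paper's scheme.

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    def deepen(network, B=1e3):
        """Append one ReLU layer computing relu(x + B) - B, which is the
        identity whenever x > -B.  On bounded inputs the deepened network
        computes exactly the same function, so added depth need not change
        (or hurt) what the original network has learned."""
        return lambda x: relu(network(x) + B) - B

    # toy check: a one-hidden-layer ReLU net agrees with its deepened version
    rng = np.random.default_rng(1)
    W, b = rng.normal(size=(5, 3)), rng.normal(size=5)
    net = lambda x: relu(x @ W.T + b)
    deeper = deepen(deepen(net))          # two extra layers
    x = rng.normal(size=(4, 3))
    assert np.allclose(net(x), deeper(x))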